Skip to content

[fix](function) Improve numerical robustness of cosine_distance / cosine_similarity#62840

Merged
yiguolei merged 7 commits into
apache:masterfrom
yx-keith:Enhance-numerical-stability-for-cosine-distance/similarity
May 28, 2026
Merged

[fix](function) Improve numerical robustness of cosine_distance / cosine_similarity#62840
yiguolei merged 7 commits into
apache:masterfrom
yx-keith:Enhance-numerical-stability-for-cosine-distance/similarity

Conversation

@yx-keith
Copy link
Copy Markdown
Contributor

@yx-keith yx-keith commented Apr 25, 2026

What problem does this PR solve?

Two defensive hardening fixes in CosineDistance::distance and CosineSimilarity::distance to guarantee correct results across the full range of valid float inputs.

Fix 1: Use double-precision intermediate when computing the norm
Before:

return 1 - dot_prod / sqrt(squared_x * squared_y);
After:

const double norm = std::sqrt(static_cast(squared_x) * static_cast(squared_y));
Why: squared_x * squared_y is a float multiplication. When squared_x and squared_y are both large (e.g. input elements around 1e19), the product exceeds FLT_MAX (~3.4e38) and overflows to +inf. Then sqrt(+inf) = +inf and dot_prod / +inf = 0, so two parallel vectors silently get cosine_distance = 1.0 (should be 0.0) — a wrong result with no warning.

For typical L2-normalized embedding vectors this never triggers. But cosine_distance accepts arbitrary float* arrays, not just normalized embeddings, so the function should be safe for any finite float input. The cost is two static_cast ops; double's range (~1.8e308) cannot overflow on any finite float input.

For non-overflow inputs the result is bit-for-bit equivalent (verified by existing tests, which match the same static_cast(34.0 / std::sqrt(14.0 * 83.0)) formula).

Fix 2: Clamp cosine to [-1, 1]
After:

return std::clamp(static_cast(dot_prod / norm), -1.0f, 1.0f);
Why: Float rounding can make the computed cosine slightly exceed 1.0 for identical (or near-identical) vectors. For example with x = y = (0.1f, 0.2f, 0.3f), accumulation rounding can yield cosine = 1.0000001, then 1.0f - cosine = -1e-7 — a negative cosine_distance that violates the metric contract d >= 0 and may break downstream code (DCHECK(distance >= 0), threshold filters, distance aggregation).

std::clamp is a one-op guarantee that costs nothing for in-range values.

BE UT 编译失败修复(独立于本 PR 余弦修复)
问题:merge 最新 master 后,BE UT 在 functions_geo_test.cpp:375 编译失败:

error: static assertion failed: assert_cast is redundant for the same type
'!std::is_same_v<ColumnVector<TYPE_BOOLEAN>, ColumnVector<TYPE_BOOLEAN>>'
根因:master 上三个 PR 叠加副作用:

#63491 把 _null_map 改为强类型 ColumnUInt8::WrappedPtr
#63059 加 static_assert 拒绝 same-type assert_cast
#63049(5/26 刚合入)新写的 geo 测试还按老接口加了冗余 cast
修复:去掉冗余包装

  • assert_cast<ColumnUInt8*>(nullable_input->get_null_map_column_ptr().get())->insert_value(0);
  • nullable_input->get_null_map_column_ptr()->insert_value(0);
    get_null_map_column_ptr() 现在直接返回 ColumnUInt8::MutablePtr,->insert_value() 语义不变。

影响:一行改动,仅修复编译报错,不涉及测试语义和余弦相关代码。

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yx-keith yx-keith changed the title [Vector] Enhance numerical stability for cosine distance/similarity [Vector] Enhance numerical stability for cosine distance and similarity Apr 25, 2026
@yx-keith yx-keith force-pushed the Enhance-numerical-stability-for-cosine-distance/similarity branch 4 times, most recently from 88e74e5 to 3e3f967 Compare April 27, 2026 01:46
@yx-keith yx-keith closed this Apr 27, 2026
@yx-keith yx-keith deleted the Enhance-numerical-stability-for-cosine-distance/similarity branch April 27, 2026 02:25
@yx-keith yx-keith restored the Enhance-numerical-stability-for-cosine-distance/similarity branch April 27, 2026 02:25
@yx-keith yx-keith reopened this Apr 27, 2026
Comment thread be/src/common/factory_creator.h Outdated
template <typename... Args> \
static std::shared_ptr<TypeName> create_shared(Args&&... args) { \
return std::make_shared<TypeName>(std::forward<Args>(args)...); \
return std::shared_ptr<TypeName>(new TypeName(std::forward<Args>(args)...)); \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这么改的目的是啥?


private:
std::atomic<Version> current_version;
doris::atomic_shared_ptr<const T> current_version;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这为什么要改?

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Avoid redundant square root calculations in cosine_distance and cosine_similarity while preserving the existing semantics for non-zero vectors. Empty arrays and zero vectors keep their previous return values, and pointer preconditions are asserted instead of silently mapping unexpected internal states to user-visible distance values.

### Release note

None

### Check List (For Author)

- Test: Manual test
    - Ran git diff --check for be/src/exprs/function/array/function_array_distance.cpp. Full build and unit/regression tests were not run because the current worktree is not initialized (.worktree_initialized and thirdparty/installed are missing).
- Behavior changed: No
- Does this need documentation: No
@yx-keith yx-keith force-pushed the Enhance-numerical-stability-for-cosine-distance/similarity branch from d4d459c to 3b84e7a Compare May 15, 2026 10:12
yx-keith added 3 commits May 15, 2026 19:34
- Replace float norm `sqrt(squared_x * squared_y)` with double-precision
  intermediate to prevent float overflow for large-magnitude vectors
  (e.g. 1e19 per element: squared product overflows float to +inf, yielding
  a wrong cosine of 0/NaN instead of 1).
- Clamp the final cosine to [-1, 1] before computing distance to prevent
  tiny negative cosine_distance values caused by floating-point rounding
  (e.g. identical vectors could yield cosine=1.0000001 -> distance=-1e-7).
- Add cosine_distance unit tests covering identical/orthogonal/opposite/zero
  vectors and a known-value case.
- Add large-magnitude regression test (1e19 elements) that would have caught
  the original overflow bug.
Reformat overlong static_cast<void>(...) call to comply with 100-column limit.
yiguolei
yiguolei previously approved these changes May 26, 2026
@yiguolei
Copy link
Copy Markdown
Contributor

run buildall

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 26, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31302 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e94725c6cc4be0a56c56feb726db33e3766af171, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17876	4069	4010	4010
q2	q3	10753	1396	832	832
q4	4687	477	358	358
q5	7564	2272	2121	2121
q6	331	175	135	135
q7	1018	792	634	634
q8	9377	1786	1532	1532
q9	6870	5006	4962	4962
q10	6441	2248	1878	1878
q11	449	276	248	248
q12	698	424	301	301
q13	18176	3413	2804	2804
q14	273	255	252	252
q15	q16	826	794	710	710
q17	959	901	970	901
q18	6815	5753	5454	5454
q19	1218	1250	1091	1091
q20	496	400	269	269
q21	5761	2609	2503	2503
q22	436	361	307	307
Total cold run time: 101024 ms
Total hot run time: 31302 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4376	4308	4270	4270
q2	q3	4558	4945	4331	4331
q4	2138	2241	1437	1437
q5	4448	4318	4847	4318
q6	252	214	141	141
q7	1984	1795	1671	1671
q8	2519	2138	2267	2138
q9	8066	8161	7982	7982
q10	4877	4781	4507	4507
q11	600	425	435	425
q12	798	788	571	571
q13	3275	3721	2998	2998
q14	302	306	291	291
q15	q16	730	739	650	650
q17	1388	1352	1325	1325
q18	7874	7500	6844	6844
q19	1109	1101	1087	1087
q20	2218	2225	1958	1958
q21	5348	4640	4535	4535
q22	520	466	418	418
Total cold run time: 57380 ms
Total hot run time: 51897 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 172371 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e94725c6cc4be0a56c56feb726db33e3766af171, data reload: false

query5	4331	646	517	517
query6	358	212	201	201
query7	4225	594	302	302
query8	323	227	221	221
query9	8827	4108	4084	4084
query10	450	344	299	299
query11	5771	2788	2291	2291
query12	184	130	124	124
query13	1264	649	424	424
query14	6118	5498	5226	5226
query14_1	4509	4517	4500	4500
query15	212	202	182	182
query16	982	453	421	421
query17	945	734	581	581
query18	2450	496	352	352
query19	211	200	156	156
query20	134	131	129	129
query21	216	145	124	124
query22	13713	13582	13491	13491
query23	17295	16561	16144	16144
query23_1	16420	16611	16393	16393
query24	7591	1787	1333	1333
query24_1	1321	1330	1349	1330
query25	550	460	414	414
query26	1322	332	170	170
query27	2684	563	347	347
query28	4510	2070	2023	2023
query29	990	623	503	503
query30	310	238	198	198
query31	1130	1077	948	948
query32	86	73	74	73
query33	534	347	282	282
query34	1197	1131	658	658
query35	841	804	683	683
query36	1412	1408	1289	1289
query37	151	105	95	95
query38	3218	3175	3109	3109
query39	934	926	903	903
query39_1	909	922	902	902
query40	239	149	135	135
query41	73	69	70	69
query42	119	112	111	111
query43	332	342	292	292
query44	
query45	221	209	206	206
query46	1095	1235	738	738
query47	2384	2382	2278	2278
query48	412	459	305	305
query49	653	500	409	409
query50	1048	372	255	255
query51	4320	4324	4366	4324
query52	107	106	95	95
query53	262	285	207	207
query54	351	298	283	283
query55	98	95	92	92
query56	325	322	318	318
query57	1438	1434	1324	1324
query58	313	285	280	280
query59	1633	1659	1484	1484
query60	334	335	329	329
query61	192	183	182	182
query62	710	664	599	599
query63	243	203	207	203
query64	2479	854	686	686
query65	
query66	1735	501	391	391
query67	30045	29767	29647	29647
query68	
query69	471	368	315	315
query70	1059	1022	973	973
query71	311	273	259	259
query72	3190	2658	2409	2409
query73	810	744	458	458
query74	5111	4979	4837	4837
query75	2693	2639	2268	2268
query76	2280	1162	780	780
query77	406	420	338	338
query78	12524	12479	11921	11921
query79	1513	1097	777	777
query80	669	559	457	457
query81	450	283	242	242
query82	1350	157	118	118
query83	353	290	247	247
query84	310	151	113	113
query85	936	553	461	461
query86	396	348	328	328
query87	3451	3406	3239	3239
query88	3613	2751	2747	2747
query89	447	392	343	343
query90	1983	185	182	182
query91	182	179	144	144
query92	83	79	74	74
query93	1455	1437	848	848
query94	545	325	292	292
query95	683	386	433	386
query96	1089	825	327	327
query97	2723	2739	2610	2610
query98	236	228	228	228
query99	1186	1167	1006	1006
Total cold run time: 255075 ms
Total hot run time: 172371 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (15/15) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.85% (20902/38817)
Line Coverage 37.40% (197913/529113)
Region Coverage 33.71% (155093/460112)
Branch Coverage 34.69% (67490/194550)

After switching to double-precision norm in cosine_distance/cosine_similarity,
the computed values are slightly more accurate (closer to the mathematical
truth). Update the regression .out expectations accordingly:

- cosine_distance([1,2,3],[3,5,7]): 0.002585053 -> 0.002585113 (true: ~0.0025851140)
- cosine_distance([1.0,2.0,3.0],[4.0,5.0,6.0]): 0.02536809 -> 0.02536815
- cosine_similarity([0.001,0.002],[0.003,0.004]): 0.9838699 -> 0.98387
- cosine_similarity + cosine_distance roundtrip: 0.9999999534439087 -> 1.0
  (clamp guarantees sim + dist = 1.0 by construction)
- table row 3 cosine_similarity: 0.9746319 -> 0.9746318
@yx-keith yx-keith changed the title [Vector] Enhance numerical stability for cosine distance and similarity [fix](function) Improve numerical robustness of cosine_distance / cosine_similarity May 26, 2026
@yx-keith
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31573 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 12b0ce08a9489e0e2a3b8b8605fbcb0f5cfb0429, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17761	4001	4014	4001
q2	q3	10748	1374	823	823
q4	4706	466	350	350
q5	7737	2240	2149	2149
q6	376	178	140	140
q7	964	780	651	651
q8	9372	1759	1584	1584
q9	7017	4990	4940	4940
q10	6462	2214	1877	1877
q11	433	269	248	248
q12	697	421	299	299
q13	18203	3383	2778	2778
q14	264	254	238	238
q15	q16	822	794	705	705
q17	963	1009	970	970
q18	6843	5814	5716	5716
q19	1202	1203	1037	1037
q20	505	417	260	260
q21	5798	2585	2512	2512
q22	433	357	295	295
Total cold run time: 101306 ms
Total hot run time: 31573 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4382	4257	4434	4257
q2	q3	4551	4921	4389	4389
q4	2094	2224	1404	1404
q5	4447	4388	4989	4388
q6	260	200	152	152
q7	1949	1848	1625	1625
q8	2553	2230	2252	2230
q9	8037	8081	7985	7985
q10	4809	4936	4342	4342
q11	594	420	375	375
q12	770	777	546	546
q13	3486	3673	2930	2930
q14	312	318	288	288
q15	q16	725	740	666	666
q17	1374	1349	1373	1349
q18	8024	7424	6738	6738
q19	1097	1094	1064	1064
q20	2248	2212	1983	1983
q21	5328	4666	4543	4543
q22	532	455	398	398
Total cold run time: 57572 ms
Total hot run time: 51652 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 172529 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 12b0ce08a9489e0e2a3b8b8605fbcb0f5cfb0429, data reload: false

query5	4332	664	528	528
query6	342	235	198	198
query7	4259	532	301	301
query8	319	247	221	221
query9	8825	4147	4137	4137
query10	446	347	305	305
query11	5709	2630	2211	2211
query12	191	126	125	125
query13	1296	586	453	453
query14	6188	5535	5222	5222
query14_1	4605	4529	4532	4529
query15	217	208	185	185
query16	1039	479	429	429
query17	1157	744	616	616
query18	2730	495	365	365
query19	225	216	173	173
query20	139	136	133	133
query21	215	138	126	126
query22	13797	13587	13408	13408
query23	17308	16495	16294	16294
query23_1	16462	16412	16244	16244
query24	7499	1796	1337	1337
query24_1	1314	1325	1332	1325
query25	582	505	437	437
query26	1314	312	177	177
query27	2729	572	346	346
query28	4437	1983	1966	1966
query29	1067	648	523	523
query30	302	246	201	201
query31	1136	1083	954	954
query32	93	81	76	76
query33	554	367	315	315
query34	1176	1147	661	661
query35	800	829	708	708
query36	1425	1453	1252	1252
query37	151	107	96	96
query38	3220	3180	3072	3072
query39	923	923	891	891
query39_1	867	877	873	873
query40	225	145	128	128
query41	64	61	60	60
query42	108	115	114	114
query43	332	333	291	291
query44	
query45	210	204	196	196
query46	1112	1178	763	763
query47	2343	2360	2253	2253
query48	398	431	297	297
query49	653	492	388	388
query50	944	361	253	253
query51	4433	4376	4242	4242
query52	106	107	94	94
query53	260	288	210	210
query54	317	271	250	250
query55	95	92	87	87
query56	321	317	302	302
query57	1463	1448	1330	1330
query58	299	272	274	272
query59	1614	1713	1503	1503
query60	319	330	307	307
query61	157	185	155	155
query62	698	653	585	585
query63	244	213	206	206
query64	2373	802	624	624
query65	
query66	1665	474	373	373
query67	29898	29684	29545	29545
query68	
query69	458	348	320	320
query70	1062	1068	975	975
query71	303	275	273	273
query72	2987	2671	2394	2394
query73	895	802	439	439
query74	5111	5008	4838	4838
query75	2706	2614	2286	2286
query76	2302	1167	784	784
query77	413	419	345	345
query78	12363	12352	11773	11773
query79	1481	1059	732	732
query80	635	543	459	459
query81	453	279	246	246
query82	1364	160	120	120
query83	359	291	252	252
query84	262	141	112	112
query85	885	539	451	451
query86	395	336	317	317
query87	3484	3405	3254	3254
query88	3640	2755	2756	2755
query89	455	389	342	342
query90	1952	193	193	193
query91	178	169	135	135
query92	79	77	76	76
query93	1497	1407	874	874
query94	532	350	272	272
query95	646	376	444	376
query96	1078	776	341	341
query97	2739	2762	2593	2593
query98	244	225	230	225
query99	1192	1141	1024	1024
Total cold run time: 255220 ms
Total hot run time: 172529 ms

yx-keith added 2 commits May 26, 2026 22:05
After apache#63491 typed _null_map as ColumnUInt8::WrappedPtr,
get_null_map_column_ptr() now returns ColumnUInt8::MutablePtr directly,
making assert_cast<ColumnUInt8*>(...) a same-type cast which is rejected
by the static_assert added in apache#63059. Drop the wrapper.
@yx-keith
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31704 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d650aba8beb6484026442ce652e94b1577a7a234, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17837	4141	4147	4141
q2	q3	10752	1406	817	817
q4	4686	472	350	350
q5	7608	2272	2108	2108
q6	347	179	137	137
q7	955	749	631	631
q8	9411	1717	1532	1532
q9	7153	4988	4977	4977
q10	6446	2216	1908	1908
q11	429	266	246	246
q12	695	426	304	304
q13	18143	3417	2791	2791
q14	268	262	242	242
q15	q16	827	778	711	711
q17	993	955	927	927
q18	6816	5650	5516	5516
q19	1358	1435	1138	1138
q20	563	413	297	297
q21	6011	2796	2601	2601
q22	451	391	330	330
Total cold run time: 101749 ms
Total hot run time: 31704 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4948	4711	5015	4711
q2	q3	4902	5320	4745	4745
q4	2182	2221	1466	1466
q5	4924	4807	4660	4660
q6	227	170	132	132
q7	1849	1815	1577	1577
q8	2581	2014	1945	1945
q9	7513	7438	7515	7438
q10	4740	4660	4212	4212
q11	569	429	376	376
q12	764	745	558	558
q13	3079	3330	2770	2770
q14	268	280	257	257
q15	q16	685	705	635	635
q17	1316	1278	1265	1265
q18	7344	6922	7054	6922
q19	1135	1104	1098	1098
q20	2228	2217	1953	1953
q21	5353	4707	4591	4591
q22	530	452	412	412
Total cold run time: 57137 ms
Total hot run time: 51723 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 172035 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d650aba8beb6484026442ce652e94b1577a7a234, data reload: false

query5	4321	661	520	520
query6	330	220	200	200
query7	4362	569	299	299
query8	321	233	241	233
query9	8842	4159	4137	4137
query10	441	362	301	301
query11	5793	2628	2264	2264
query12	186	131	124	124
query13	1294	607	447	447
query14	6215	5523	5216	5216
query14_1	4537	4525	4488	4488
query15	217	201	182	182
query16	1046	425	421	421
query17	1167	730	587	587
query18	2554	488	349	349
query19	211	200	157	157
query20	151	136	129	129
query21	220	136	116	116
query22	13813	13753	13403	13403
query23	17461	16595	16160	16160
query23_1	16487	16385	16343	16343
query24	7417	1785	1333	1333
query24_1	1313	1328	1311	1311
query25	583	518	447	447
query26	1306	321	175	175
query27	2700	587	356	356
query28	4461	2032	2026	2026
query29	1042	638	534	534
query30	301	248	213	213
query31	1146	1089	966	966
query32	94	82	74	74
query33	556	365	314	314
query34	1176	1117	662	662
query35	786	804	717	717
query36	1420	1411	1274	1274
query37	159	109	98	98
query38	3222	3176	3119	3119
query39	946	910	935	910
query39_1	877	869	873	869
query40	232	151	129	129
query41	71	69	68	68
query42	115	112	113	112
query43	342	339	296	296
query44	
query45	215	212	201	201
query46	1098	1176	755	755
query47	2382	2359	2234	2234
query48	414	428	303	303
query49	655	510	410	410
query50	1046	358	259	259
query51	4409	4332	4255	4255
query52	108	107	97	97
query53	277	293	221	221
query54	326	280	270	270
query55	96	93	87	87
query56	307	316	332	316
query57	1463	1418	1348	1348
query58	308	277	272	272
query59	1656	1735	1483	1483
query60	339	342	319	319
query61	204	152	155	152
query62	700	660	579	579
query63	238	205	208	205
query64	2389	788	640	640
query65	
query66	1724	489	348	348
query67	29777	29691	28902	28902
query68	
query69	471	350	310	310
query70	1019	1006	972	972
query71	298	262	275	262
query72	3004	2463	2486	2463
query73	850	747	436	436
query74	5184	4966	4836	4836
query75	2707	2625	2283	2283
query76	2291	1138	748	748
query77	402	423	339	339
query78	12344	12560	11894	11894
query79	1489	1059	765	765
query80	662	544	456	456
query81	450	287	249	249
query82	1376	157	118	118
query83	355	290	263	263
query84	270	141	107	107
query85	871	556	497	497
query86	405	345	338	338
query87	3446	3376	3243	3243
query88	3613	2746	2706	2706
query89	447	392	343	343
query90	1969	191	196	191
query91	189	171	144	144
query92	79	89	75	75
query93	1469	1447	849	849
query94	535	361	316	316
query95	669	471	351	351
query96	1119	875	341	341
query97	2746	2748	2581	2581
query98	252	232	238	232
query99	1151	1141	1038	1038
Total cold run time: 255483 ms
Total hot run time: 172035 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (15/15) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.89% (20925/38828)
Line Coverage 37.43% (198078/529216)
Region Coverage 33.73% (155221/460240)
Branch Coverage 34.71% (67546/194627)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (15/15) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.82% (28071/38028)
Line Coverage 57.74% (304806/527860)
Region Coverage 54.86% (254937/464663)
Branch Coverage 56.39% (110154/195353)

@yx-keith yx-keith requested review from morningman and yiguolei May 27, 2026 01:08
@yiguolei
Copy link
Copy Markdown
Contributor

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking correctness issue in the numerical-stability fix. Existing review threads about factory_creator.h and multi_version.h were reviewed and are not duplicated here.

Critical checkpoint conclusions:

  • Goal and tests: The PR aims to make cosine_distance/cosine_similarity numerically robust for large finite FLOAT inputs and adds BE unit coverage, but the implementation still fails for valid larger values because multiplication/accumulation remains in float before the double norm is used.
  • Scope and clarity: The change is focused, but the fix is incomplete in the main data path.
  • Concurrency and lifecycle: No new concurrency, locking, static initialization, or non-obvious lifecycle risks found.
  • Configuration and compatibility: No new config, persistence, storage format, or FE-BE protocol compatibility concerns found.
  • Parallel paths: The same accumulation issue exists in both cosine_distance and cosine_similarity; the inline comment covers both.
  • Error handling and special conditions: Existing zero/empty handling is preserved; no ignored Status issue found.
  • Test coverage: Tests cover the product-of-squared-norm overflow case, but not overflow during the per-element float multiplication/accumulation itself.
  • Observability, transactionality, memory: Not applicable to this PR.
  • Performance: Widening the accumulators before multiplication is the relevant correctness fix; no separate performance blocker found.

User focus: No additional user-provided review focus was specified.


DCHECK(x != nullptr && y != nullptr);

float dot_prod = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still overflows before the new double norm is computed because dot_prod, squared_x, and squared_y are accumulated as float and the products are evaluated as float. For a valid finite FLOAT input such as cosine_similarity([1e20], [1e20]), x[i] * x[i] and x[i] * y[i] become +inf; then dot_prod / norm is inf / inf, producing NaN (std::clamp does not sanitize NaN). The expected result for parallel vectors is still 1.0 similarity / 0.0 distance. Please widen the accumulators and the per-element products to double before accumulation in both cosine functions, and add a test that covers this overflow-at-accumulation case.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch. The per-element overflow only triggers when |x[i]| > sqrt(FLT_MAX) ≈ 1.84e19, which is even more pathological than the outer-product overflow this PR addresses. Widening accumulators to double would halve SIMD throughput (~5-20% end-to-end regression in vector search workloads) while only covering a stricter subset of overflow scenarios that have never been observed in real embeddings. We prefer to keep the current scope; happy to file a follow-up if a real-world case is reported.

@yiguolei yiguolei merged commit 84e47b6 into apache:master May 28, 2026
33 of 34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.x dev/4.1.x-conflict reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants